perm filename MOLGEN.GLO[HPP,DBL] blob sn#188021 filedate 1975-11-25 generic text, type C, neo UTF8
∂25-NOV-75  1431	FTP: host SUMEX
Date: 25 NOV 1975 1428-PST
From: STEFIK at SUMEX-AIM
Subject: <MOLGEN>sum.hpp
To:   dbl at SU-AI

Doug,
	That is the name of the file which contains my first attempt
at a summary format for the conference.  Give me your reactions.
Also - note that the Glossary is not written in the form of a
dictionary, but rather as a short essay which defines a few technical
terms.  My feeling is that it conveys more than a dictionary, altho
I would feel more comfortable if the technical terms were placed in
italics when they were being defined. (Alas our line printer .. and
"" and ALL CAPS ideas are a bit much.  Maybe I should boldface it
using PUB or something. )
				Mark


Copy of <MOLGEN>sum.hpp follows: --------------------------
---------------------------------------------------------------------
Overview of the MOLGEN Project

	The proposed MOLGEN program is being designed to
reason with genetic structures and experimental constraints.
It is somewhat analogous to the existing CONGEN program
which deals with chemical structures and constraints.
The common basic premise in the two programs is that the
description level of "structures" is both rich enough for
the human expert's interest and limited enough to permit
reasonable program manipulation.  MOLGEN, however, is expected
to perform experimental planning using a wide variety 
of biological knowledge.

	The broad scope of the MOLGEN program would encompass
"legal moves" on changes to DNA. For example,

	1. GIVEN a structural hypothesis,
	   PLAN experiments and procedures which would verify
	   the hypothesis. (structural analysis).

	2. GIVEN starting and target DNA structures,
	   PLAN experiments and procedures which will transform
	   the initial structure to the final structure without
	   excessive loss in the experiment. (structural synthesis).

	3. GIVEN an initial DNA structure and a set of transformations
	   which were carried out, VERIFY the correctness of
	   the hypothesized final structures and show any 
	   other possible final structures. (proof checker).
---------------------------------------------------------------
Documentation for MOLGEN

	The <MOLGEN> account has been set up as a repository for
programs and documentation for the MOLGEN project. The TENEX
file <MOLGEN>MESSAGE.TXT is used to give brief summaries
for the detailed documentation files.  To use this file, one may
log into his own area and invoke the BANANARD program. Use the
Read command to get the <MOLGEN>MESSAGE.TXT file and an
Inverse command to get a listing of recent documentation
file names.  The Type command on the corresponding
message number gives a brief summary.  If you decide that the
file is interesting, you may list it out from the EXEC.  Files
of most probable interest might include:

<MOLGEN>SUM.SEPT5A	Overview of Hierarchical Planning 

<MOLGEN>SUM.SEPT23	Encoding of Enzyme mechanisms
		        (Not quite current.)
<MOLGEN>SUM.OCT10	DNA representation Data Structures

<MOLGEN>SUM.OCT29	Enzyme Action Simulator Design Overview


------------------------------------------------------------
Technical Terminology from Molecular Genetics

	Deoxyribonucleic acid (DNA) is an organic molecule which
carries the genetic instructions for building and maintaining
living organisms. Its chemical structure includes a sugar
backbone formed by deoxyribose molecules which are linked together
via phosphates to form long polymers. The carbons of each sugar
are numbered 1' thru 5' and the phosphates are connected to the
3' and 5' carbons.  Since 5' carbons of one sugar are always
connected to the 3' of the next sugar on the backbone, the
entire backbone takes on an orientation - having a 3' and
a 5' end. To each sugar is attached a base at the 1' carbon.
For DNA, there are four types of bases: adenine (A), guanine (G),
thymine (T), and cytosine (C). The bases form
a sort of genetic alphabet, the sentences of which are spelled
out by their ordering on the molecules of DNA. 
The unit consisting of one sugar molecule, its attached base, and
a phosphate group is called a nucleotide.  An unattached nucleotide is termed a 
mononucleotide.  The terms dinucleotide and trinucleotide
have analogous meanings and oligonucleotide means a short 
sequence of nucleotides - generally less than fifty nucleotides.
Human DNA consists of about 5 billion nucleotides and would
be about 2 meters long if it were all stretched out. (Remember
that this molecule is contained in each cell of the body.) The
DNA for a virus consists of only a few thousand nucleotides.
An important property of DNA is that it is almost always
double stranded - that is, that it has two antiparallel
backbones and two bases for each unit of length. In
fact, the bases are invariably paired - A with T and G with C
along the length.  Topologically this looks like a very long
ladder with the sugars forming the sides and the paired bases
forming the rungs.  This pairing occurs because of a special
way the bases fit together and results
in a redundancy in the DNA molecule which both makes
reproduction possible and allows for repair of damage
to the DNA caused by UV light or physical stress.
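
	The pairing rule and the antiparallel orientation described
above can be sketched in a few lines of Python.  This is only an
illustration and not part of the MOLGEN design: the names PAIR,
complement_strand and is_paired are hypothetical, and strands are
assumed to be plain strings written from the 5' end to the 3' end.

```python
# Pairing as described above: A with T, G with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand):
    """Return the partner strand, also written 5' to 3'.

    Because the two backbones are antiparallel, the partner is the
    base-by-base complement read in the opposite direction.
    """
    return "".join(PAIR[base] for base in reversed(strand))


def is_paired(strand_a, strand_b):
    """True if two 5'->3' strands would form a fully paired duplex."""
    return complement_strand(strand_a) == strand_b
```

Applying complement_strand twice returns the original strand, which
is one way to see the redundancy mentioned above: either strand is
enough to rebuild the other.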
	Antiparallel sequences of DNA with corresponding 
complementary bases are called homologous sequences. When
double stranded DNA is subjected to increased pH,
high temperatures, or lower ionic strength, it denatures.
That means that the homologous sequences come apart and the
DNA becomes single stranded. By reversing the conditions,
the DNA can be caused to renature or become double stranded again. 
	Enzymes are substances which living organisms use
to catalyze reactions. Many enzymes can be used
to carry out transformations on DNA. The term substrate
is used to refer to the substance that an enzyme
will act upon and transform into the product of the reaction.
A nuclease is an enzyme which will cut DNA into smaller pieces.
An exonuclease will cut from an end and an endonuclease
will cut DNA some place in the middle of a strand. A restriction
enzyme is an endonuclease which recognizes a specific
base sequence and cuts there. Such an enzyme is usually paired
in biological systems with a modification enzyme which modifies
(for example, methylates) the bases just enough to prevent their
recognition by the restriction enzyme. Restriction/modification
systems are used by cells to destroy invading foreign
DNA. In such systems, the host DNA is modified and the invading
DNA is (hopefully) destroyed. Ligase is an enzyme which can
seal nicks in DNA. Polymerase is an enzyme which can be used
to copy DNA.  Thousands of different kinds
of enzymes requiring particular conditions and substrates
have been cataloged by molecular biologists. 
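
	The action of a restriction enzyme, and the protection given
by a modification enzyme, can be illustrated with a small Python
sketch.  It is a simplification made for this glossary only: the DNA
is treated as a single string, the function names find_sites and
digest are hypothetical, and methylation is modelled merely as a set
of protected site positions.  The example site GAATTC, cut after the
G, is the recognition sequence of the enzyme EcoRI.

```python
def find_sites(dna, site):
    """Return the start index of every occurrence of `site` in `dna`."""
    hits, start = [], dna.find(site)
    while start != -1:
        hits.append(start)
        start = dna.find(site, start + 1)
    return hits


def digest(dna, site, cut_offset, protected=frozenset()):
    """Cut `dna` at each unprotected recognition site.

    cut_offset -- position of the cut, counted from the start of the site
    protected  -- start indices of sites that a modification enzyme has
                  methylated; these sites are not recognized
    """
    cuts = [i + cut_offset for i in find_sites(dna, site)
            if i not in protected]
    fragments, previous = [], 0
    for cut in cuts:
        fragments.append(dna[previous:cut])
        previous = cut
    fragments.append(dna[previous:])
    return fragments


if __name__ == "__main__":
    foreign = "TTGAATTCAAGGAATTCTT"
    print(digest(foreign, "GAATTC", 1))                 # unmodified DNA
    print(digest(foreign, "GAATTC", 1, protected={2}))  # first site methylated
```

With both sites unprotected the string is cut into three fragments;
protecting the first site leaves only two.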
	Chromatography is a generic term for any of the methods
of separating out the various components of a mixture.  Ion
exchange chromatography can be used to separate nucleotides
from nucleosides. (A mononucleoside is a mononucleotide whose
phosphate group has been lost and replaced by a hydroxyl group.)
Short oligonucleotides can be separated into their
differing lengths by gel electrophoresis.  Single stranded
DNA can be separated from double stranded DNA by a
chromatographic technique using the matrix hydroxyapatite. 

	E. coli and B. subtilis are two species of bacteria,
grown on agar gel, which are used for most experiments
in molecular biology.  The genetic map, or location of
genes on the DNA, is better understood for these organisms
than for any other. 

------------------------------------------------------
A.I.  Methodologies and Outstanding Questions

	Two major AI issues are being faced, and will continue to be faced,
during the development of the MOLGEN project: the representation of a wide
variety of knowledge types, and intelligent planning in a world of many
possibilities.
	Five major types of information need to be represented by 
MOLGEN:
	1.  DNA structures themselves--we have chosen a list represen-
tation with individual cells storing knowledge about a discrete 
structural unit of the DNA molecule.  Interesting questions of "fuzzy"
data have yet to be faced because geneticists themselves usually have
very incomplete information about the molecules they work with.
	2.  Enzymatic chemistry--enzymes are the working tools of
molecular genetics and their operations and constraints must be known.
We envision a rule-based data file storage, modelled on the
production system approach used by MYCIN.
	3.  Experimental techniques--laboratory methods like
electron microscopy and chromatographic separation are tools of the 
geneticist also--just how to store this knowledge is yet uncertain.
	4.  Mathematical models for various processes--certain
physical processes such as de/renaturation are best represented as
formal mathematical models, i.e. as procedures which use stochastic
methods to modify our model of DNA structures.
	5.  Meta-rules for planning--we need some good way of
representing higher-order rules for global planning strategies (more on
this below)--a production system-like approach is probable, but
considerable study must be made of the way geneticists operate to make
a final determination.
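
	To make items 1 and 2 above more concrete, here is a minimal
Python sketch of a list of cells plus one condition/action rule.
Everything in it is an assumption made for illustration: the Cell
fields, the "nick_after" note, and the names Rule, apply_rules and
LIGASE are hypothetical and are not taken from the MOLGEN or MYCIN
code.  Unknown lengths are stored as None to hint at the incomplete
data mentioned in item 1.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional


@dataclass
class Cell:
    """One discrete structural unit of a DNA molecule (hypothetical layout)."""
    bases: str                     # known sequence, or "" if unknown
    length: Optional[int] = None   # None when the length is uncertain
    notes: dict = field(default_factory=dict)


@dataclass
class Rule:
    """A production rule: if `condition` holds, `action` gives the product."""
    name: str
    condition: Callable[[List[Cell]], bool]
    action: Callable[[List[Cell]], List[Cell]]


def has_nick(molecule):
    return any(cell.notes.get("nick_after") for cell in molecule)


def seal_nicks(molecule):
    # Build new cells so that the substrate description is left untouched.
    return [replace(cell, notes={k: v for k, v in cell.notes.items()
                                 if k != "nick_after"})
            for cell in molecule]


# Ligase seals nicks in DNA (see the glossary above).
LIGASE = Rule("ligase-seals-nick", has_nick, seal_nicks)


def apply_rules(molecule, rules):
    """Fire the first rule whose condition holds.

    Returns (rule name, product), or (None, molecule) if no rule applies.
    """
    for rule in rules:
        if rule.condition(molecule):
            return rule.name, rule.action(molecule)
    return None, molecule
```

Keeping the rules as data, separate from the loop that fires them,
is the feature borrowed from the production system idea; how the real
rule file should be organized is left open here.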

	The second major issue is the proper way to produce total
experimental plans, i.e. how to apply the various types of knowledge
stored by the program.  Although detailed plans have not yet been 
finalized, we imagine a hierarchical planning approach where global
strategies are chosen from a list of alternatives and planning is done 
on levels of ascending detail until a final feasible plan is discovered.
The approach is similar to that adopted by some in the domain of robot
planning (e.g. Sacerdoti and ABSTRIPS), and will require careful dynamic
determination of just what factors are most crucial in a given experi-
ment.  We expect this question of weighting priorities to be
one of the most interesting AI issues to be faced during the development
of MOLGEN.
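
	The idea of choosing among alternatives and then planning in
increasing detail can be sketched as follows.  This is a plain
depth-first refinement with backtracking, not ABSTRIPS and not the
planner MOLGEN will eventually use; the step names, the REFINEMENTS
table and the feasible test are all hypothetical, and the sketch says
nothing about how priorities should be weighted.

```python
# Each abstract step maps to a list of alternative refinements; each
# alternative is a sequence of more detailed steps.  (Hypothetical data.)
REFINEMENTS = {
    "synthesize target": [["cut source", "join fragments"]],
    "cut source": [["apply restriction enzyme"], ["apply exonuclease"]],
    "join fragments": [["renature", "apply ligase"]],
}

# Steps that need no further refinement.
PRIMITIVE = {
    "apply restriction enzyme",
    "apply exonuclease",
    "renature",
    "apply ligase",
}


def refine(step, feasible):
    """Expand `step` into a list of primitive steps.

    `feasible` is a caller-supplied test on primitive steps.  Returns
    None if no alternative leads to a fully feasible plan.
    """
    if step in PRIMITIVE:
        return [step] if feasible(step) else None
    for alternative in REFINEMENTS.get(step, []):
        plan = []
        for sub_step in alternative:
            sub_plan = refine(sub_step, feasible)
            if sub_plan is None:
                break              # this alternative fails; try the next
            plan.extend(sub_plan)
        else:
            return plan            # every sub-step was refined
    return None


if __name__ == "__main__":
    print(refine("synthesize target", lambda step: True))
    print(refine("synthesize target",
                 lambda step: step != "apply restriction enzyme"))
```

When the restriction enzyme is ruled out, the sketch falls back to
the second alternative for cutting; if no alternative survives, it
reports failure by returning None.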

-------